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Abstract:  In  signal  processing,  high  resolution  sig¬ 
nal  parameter  estimation  is  a  significant  problem.  In 
particular  the  estimation  of  the  direction  of  the  nar¬ 
row  band  signals  emitted  by  multiple  sources  received 
wide  applications  recently  in  signal  processing  litera¬ 
ture.  Quite  a  number  of  papers  appeared  in  the  last 
twenty  five  years  regarding  the  estimation  of  the  pa¬ 
rameters  of  the  direction  of  arrival  of  signals,  but  not 
that  much  attention  has  been  given  in  estimating  the 
number  of  signals.  In  this  paper  we  develop  a  method 
using  penalty  function  technique.  But  instead  of  using 
any  fixed  penalty  function  like  AIC  or  MDL,  a  class 
of  penalty  functions  satisfying  some  special  properties 
have  been  used.  We  prove  that  any  penalty  function 
from  that  particular  class  will  produce  consistent  esti¬ 
mates  under  the  assumptions  that  the  error  random 
variables  are  independent  and  identically  distributed 
with  mean  zero  and  finite  variance.  We  also  obtain 
the  probabilities  of  wrong  detection  for  any  particular 
penalty  function  and  estimate  it  using  the  matrix  per¬ 
turbation  technique.  It  gives  some  idea  to  choose  the 
proper  penalty  function  for  any  particular  model.  Sim¬ 
ulations  are  performed  to  verify  the  usefulness  of  the 
analysis  and  to  compare  our  method  with  the  existing 
ones. 
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1.  INTRODUCTION: 


Detecting  the  number  of  signals  and  estimating  the 
parameters  of  the  signals  are  important  problems  in  sig¬ 
nal  processing.  There  has  been  a  great  deal  of  recent 


interests  in  the  use  of  signal  subspace  processing  meth¬ 
ods  for  the  estimation  of  the  direction  of  arrival  (DOA) 
of  multiple  plane  waves  or  frequencies  of  sinusoids.  See 
for  example,  [1,  2,  7,  10,  11,  12,  14,  16,  23,  25,  28].  The 
problem  can  be  formulated  as  follows: 

x(t)  =  As(t)  +  n(£);  £  =  l,...,iV,  (1.1) 

where  x(t)  is  a  p  x  1  complex  valued  observation  vector, 
s(t)  is  the  q  x  1  complex  valued  unobserved  signal  vector 
and  n(t)  is  a  p  x  1  complex  valued  noise  vector  at  the 
time  point  t.  A  =  [  A(^i) . . .  A((f>q)]  is  a  px  q  matrix, 
where  A (</>*)  is  the  px  1  complex  valued  direction  vector 
of  the  kth  wavefront  and  parameterized  by  an  unknown 
parameter  vector  <£*,  associated  with  the  kth  signal.  We 
assume  that  s(t)  and  n(t)  are  complex  vectors  which  are 
distributed  independently  with  each  other.  The  p  x  q 
matrix  A  has  the  special  structure 

A  —  [ai , . . . ,  a9] 

and 

a*  =  a  (<£*)  =  [1,  e-^ e~iu°  (p-1)*‘]T,  (1.2) 

where  j  =  \/— 1,  4>k  =  C-1  Asm(0*),  C  -  speed  of  prop- 
agation,  0*  is  the  direction  of  arrival  of  signal  from  the 
kth  source  and  A  is  the  inter-source  distance  (see  [17]). 
It  is  always  possible  to  take  u;0  to  be  unity,  without  loss 
of  generality  (see  [17]).  One  of  the  important  problem 
is  the  estimation  of  g,  the  number  of  signals  another  is 
the  estimation  of  <j>\ , . . . ,  </>q. 

Estimating  the  parameters  of  the  model  (1.1)  is  a 
very  important  problem  in  signal  processing.  This  is  the 
situation  in  sensor  array  processing  [5,  6,  10],  in  Har¬ 
monic  analysis  [15],  in  retrieving  the  poles  of  a  system 
from  natural  response  (Wax,  Schmidt  and  Kailath;  [24]) 
and  also  in  retrieving  overlapping  echoes  from  radar 
back  scatter  ([17]). 

Estimation  of  <j>i , . . . ,  <j>q  assuming  q  known  is  usu¬ 
ally  solved  by  some  eigen  decomposition  method. 

There  are  several  eigen-decomposition  methods  avail¬ 
able  in  the  literature,  for  example  MUSIC  [3,  22],  modi¬ 
fied  MUSIC  [12],  ESPRIT,  TLS-ESPRIT  ([20]),  GEESE 


([17]),  Bai  and  Rao  method  [1]  ,  the  modified  Bai  and 
Rao  method  [14]  or  the  method  proposed  by  Kannan, 
Kundu  and  Mitra  [8].  •  For  detailed  discussions  of  the 
different  eigen-decomposition  methods  the  readers  are 
referred  to  the  Ph.D.  thesis  of  Kannan  [7]  or  the  re¬ 
view  article  of  Paulraj  et  ai  [16].  MUSIC  and  modi¬ 
fied  MUSIC  algorithms  are  obtained  by  minimizing  the 
Hermitian  form  of  an  exponential  function.  The  solu¬ 
tion  is  obtained  by  a  search  procedure  which  is  itera¬ 
tive  in  nature.  The  other  decomposition  methods  like 
ESPRIT,  TLS-ESPRIT,  GEESE,  Kannan,  Kundu  and 
Mitra  method,  Bai  and  Rao  method  or  the  modified 
Bai  and  Rao  method  are  all  non  iterative  in  nature.  It 
is  observed  that  among  several  non  iterative  methods 
centro  symmetric  modified  Bai  and  Rao  method  works 
very  well. 

The  estimation  of  q  was  attempted  by  many  re¬ 
searchers.  Wax  and  Kailath  [23]  and  Zhao,  Krishnaiah 
and  Bai  [28]  study  this  problem  from  the  parametric 
point  of  view,  where  as  Bhandari  and  Bansal  [2]  and 
Yin  and  Krishnaiah  [27]  study  this  problem  from  the 
Bayesian  and  non  parametric  point  of  view  respectively. 
Comparison  and  comments  on  the  different  methods 
can  be  found  in  [11]. 

Estimation  of  the  number  of  signals  and  the  perfor¬ 
mance  analysis  of  a  related  model  can  be  found  in  [19, 
22,  9,  13].  In  all  the  methods,  i.e .  in  estimation  and  in 
performance  analysis  computation,  it  is  assumed  that 
the  signal  random  variable  s(t)  and  the  noise  random 
variable  n(t)  are  Gaussian  random  variables  and  it  is 
not  very  easy  to  relax  this  assumption. 

The  main  aim  of  this  paper  is  to  provide  a  consis¬ 
tent  method  of  estimation  of  q  of  the  model  (1.1)  and 
carry  out  the  performance  analysis  without  assuming 
that  s(t)  and  n(t)  are  Gaussian  random  variables.  We 
assume  the  following 

E(s(t))  =  0,  E(s(t)s(t)H)  =  >  0,  (1.3) 

E(n(t))  =  0,  E(n(t)n(t)ff)  =  o*I, 

s(t)  and  n(t)  are  independent  and  (s(t),  n(t));  for  t  = 
l,...,iV  are  independent  and  identically  distributed 
random  variables.  Here  I  is  the  identity  matrix  of  or¬ 
der  p  x  p  and  ‘H’  denotes  the  conjugate  transpose  of 
a  matrix  or  a  vector.  In  developing  the  procedure, 
we  use  the  information  theoretic  criteria.  But  not  any 
fixed  penalty  function  has  been  used  like  AIC  or  MDL, 
but  a  class  of  penalty  functions  satisfying  some  special 
properties  like  EDC  of  [28]  has  been  used.  Unlike  [28] 
proving  the  strong  consistency  we  don’t  need  any  dis¬ 
tributional  assumption  of  s(t)  or  n(t).  We  carry  out 
the  performance  analysis  of  the  proposed  method  using 
the  matrix  perturbation  technique  and  large  sample  ap¬ 
proximation.  We  compute  the  probability  of  wrong  de¬ 
tection  for  large  sample  size  for  any  particular  penalty 


function  in  that  given  class  and  that  gives  some  idea 
which  penalty  function  should  be  used  for  any  particu¬ 
lar  sample. 

The  organization  of  the  rest  of  the  paper  is  as  fol¬ 
lows.  We  develop  the  method  in  Section  2  and  the 
strong  consistency  results  are  provided  in  Section  3. 
The  performance  analysis  is  carried  out  in  Section  4 
and  some  numerical  results  are  reported  in  Section  5. 
The  choice  of  the  penalty  function  is  suggested  in  Sec¬ 
tion  6.  In  Section  7  we  address  the  problem  if  the  error 
is  known  to  be  Gaussian  and  finally  we  draw  conclu¬ 
sions  from  our  work  in  Section  8. 


2.  SOME  NOTATIONS  AND  ESTIMATION 
PROCEDURE: 

We  use  the  following  notations  through  out  the  pa¬ 
per.  Let  R  be  the  variance  covariance  matrix,  i.e. 

R  =  E(x(t)x(t)H)  =  A^Ah  +  <r2I  (2.1) 

and  R  be  the  sample  variance  covariance  matrix,  i.e. 

(2.2) 

J  *=i 

Although  R  depends  on  iV,  for  brevity  we  are  not  mak¬ 
ing  it  explicit.  We  can  write 

R  =  R+(R-R).  (2.3) 

Here  (R-R)  denotes  the  perturbation  of  the  matrix  R 
and  clearly  some  norm  of  the  perturbation  matrix  goes 
to  zero  as  N  tends  to  infinity.  Let’s  denote  the  spectral 
decomposition  of  the  matrix  R  be  as  follows 

p 

R  =  X)  5  *(1)  >  •  •  •  >  \q)  >  *(9+1) 

t=l 

=  ■  •  •  =  A(p)  =  ct2  .  (2.4) 

Here  A(i)’s  are  the  eigenvalues  and  Z^’s  are  correspond¬ 
ing  orthonormal  eigenvectors  of  R.  Note  that  A(i), . . . , 
A(p)  denote  the  ordered  eigenvalues  of  R.  Let  the  cor¬ 
responding  spectral  decomposition  of  R  be  as  follows: 

p 

R  =  ^A(i)ZiZf,  A(X)  >  ...  >  A(p),  (2.5) 

i=  1 

here  A(£)’s  are  the  eigen  values  and  Z*’s  are  the  cor¬ 
responding  orthonormal  eigen  vectors  of  R.  Similarly 
A(x), . . . ,  A(p)  are  the  ordered  eigen  values  of  R.  Con¬ 
sider  the  following  function 

IC(k,  Cm)  =  A  (fc+1)  +  Wn]  k  =  0, 1, . .  .p  —  1, 

(2.6) 


here,  Cn  satisfies  the  following  conditions 


for  k  =  0, 1, . . .  —  l,q  -f  1, . . .  —  1.  Consider  two 

different  cases 


(a)  Cm  >  0  (»C*-0 

(2.7) 

Let 

g  =  arg  min  JC(fc,  CV),  for  0  <  A;  <  p  -  1,  (2.8) 

then  q  is  an  estimator  of  q .  Clearly  q  is  a  function  of 
N  and  Cn,  but  we  are  not  making  it  explicit  for  nota- 
tional  convenience.  In  the  next  section  we  prove  that 
q  is  a  consistent  estimator  of  q  if  (s(t),  n(t))  satisfies 
assumption  (1.3)  and  Cn  satisfies  (2.7).  In  the  subse¬ 
quent  section  we  suggest  how  to  choose  Cn- 

3.  CONSISTENCY  RESULTS: 


Case  1:  k  <  g, 

IC(qtC*)-IC(k9CN) 

-  A(7+i)  -  A(*+i)  +  {q  -  k)CN 

=  A(g+ 1)  -  A(fc+i)  +  {q-  k)CN  +  O  ^°9l°9N^j 

=  -7 (k+1)H<-k)cN+0(^y. 

(3.5) 

Since  the  second  and  the  third  term  of  (3.5)  go  to  zero 
as  N  tends  to  infinity  and  since  7 (k  +  1)  >  0,  implies 
for  large  TV, 


We  need  the  following  lemma  for  further  develop¬ 
ment. 

Lemma  Is  Let  P  =  (( Pij ))  and  Q  =  ((Qij))  be  two 
m  x  m  Hermitian  matrices  with  the  following  spectral 
decompositions 

m  m 

p  =  q = 

t=i  t=i 

(3.1) 

where  Si  >  . . .  >  Sm  and  /xi  >  . . .  >  /im 
and  {ui , . . . ,  um}  and  {vx . . .  vm}  are  the  orthonormal 
set  of  eigenvectors  of  P  and  Q  respectively.  If  there 
exists  an  a,  such  that  | -  Qij  \  <  a  for  all  i,j  = 

1. .  . . ,  m,  then  there  exists  a  C  such  that  —  /if|  <  Ca 
for  all  i  =  1, . .  .m. 

Proof:  The  proof  mainly  follows  from  von  Neumann 
inequality  but  see  also  [1]  for  details. 

By  the  law  of  iterated  logarithm,  we  can  say  that 
R  =  K  +  0(!^.y  (3.2) 

If  we  denote  the  non  zero  eigenvalues  of  A^tAH  as 

71. .  . . ,  7 q  (order  them  as  7^)  >  . . .  >  7(9))  then  it  is 
immediate  that 

A(i)  =  7(0  +  ff2  for  i  =  l,...q 

A  (f)  =  a1  for  i  =  <7  +  1, . . .  ,p. 

(3.3) 

Observe  that  to  prove  q  is  a  strongly  consistent  estima¬ 
tor  of  <7,  it  is  enough  to  prove  that  for  large  iV, 

(3.4) 


IC{q,  CN)  -  IC{k,  CN)<  0  a.s..  (3.6) 

case  II  q  <  k  IC(q,CN )  —  IC(k,CN) 

=  A(9+i)  -  A(jfe+i)  +  (q  -  k)CN 

=  (q~k)CN+0  (*2^)*  . 

Therefore 

(IC(q,CN)  -  IC(ktCN) 

CN 

=  (9.,)  +  J_0(!£^)‘.  ,3.7, 

Now  by  the  properties  of  Cn,  the  second  term  on  the 
right  hand  side  goes  to  zero.  Since  Cn  >  0,  therefore 
for  large  N 

IC(q,  Cn)  —  IC(k,  Cn)  <  0  a.s.  (3.8) 

for  q  <  k.  Combining  (3.6)  and  (3.8),  we  obtain  (3.4) 
and  that  proves  the  result. 


4.  PERFORMANCE  ANALYSIS 


In  this  section  we  obtain  the  bound  for  P{q  ^  q}  at 
least  for  large  N.  Note  that 

P{q*q}  =  YlP{q  =  k}+  £  P{q  =  k}.  (4.1) 

fc=0  k=q-\-l 

Consider  two  different  cases: 

Case  Ilf  k  <  q 

P{q  =  k} 

=  P{IC(k,CN)  <ICV,CN) 

for  j  =  0, 1, . . . ,  A:  —  1,  k  + 1, . . .  ,p  -  1} 


IC(q,CN)  —  IC(k,CN)  <  0  a.s. 


<  P{IC(k,CN)  <  IC(q,CN)} 

=  ^{A(*+i)  -  ^(9+1)  +  {k  -  q)Cn  <  0} 

=  -^{^(*+1)  -  A(g+i)  +  (k  -  Q)Cn 

<  (A(fc+1)  -  A(*+ 1)  +  A(?+i)  -  A(g+1))} 

=  P{l(k+1) 

<  (Q  -  tyCpt  +  (A(fc+1)  -  A(fc+1))  + 

A(g+1)  “  A(9+l)}- 

(4.2) 

Since  Cn  tends  to  zero,  (A(*+i)  —  A(^+i))  tends  to  zero, 
(A(9+i)  -  A (9+1))  tends  to  zero  and  7(*+i)  >  0>  «•*•> 
therefore  for  sufficiently  large  AT, 

P{q  =  k}  =  0  for  k  <  q.  (4.3) 

Case  II  If  k  >  q 

P{q  =  k}  =  P{IC(k1CN)  <  IC(j,CN) 

=  for  j  =  +  1} 

=  -  \j+i)  +  (k  -  j)CN  <  0 

for  j  =  0, 1,  — ,  Jb  —  1,  Jb  +  1,  —  ,p  —  1} 

Therefore,  by  the  same  argument  as  (4.3),  we  can  say 
that  for  large  Af, 

P(q  =  k)  =  P{X(j+1)-X(k+1)  +  o(^^y 
>  ( k-j)CN 

for  j  =  0,1,. . .,k  —  1,*  +  1,.. .  ,p-  1} 
=  P{\k+l)  -  A(j+1)  +  (k  -  j)CN  <  0 
for  j  =  q,...,k  —  l,k  +  l,...,p  —  l} 

<  P{\q+i)  ~  A(t+ 1)  >  (k  -  g)Cjv} 

since  A(,,+1)  -  A(fc+1)  >  0  for  j  =  0, 1, . . . ,  q  -  1.  So,  for 
large  N 

p-i 

P{q*q}*  £  pti  =  k)-  (4-4) 

k=q+l 

Therefore,  to  compute  (4.4),  we  need  to  know  the  joint 
distribution  of  A(X), . . . ,  A(p).  We  use  matrix  pertur¬ 
bation  technique  to  compute  the  joint  distribution  of 
A(i),  •  •  • ,  A(p).  Observe  that  from  the  Central  limit  the¬ 
orem,  we  can  say; 

VN(Vec(R)  -  Vec(R)) 

is  asymptotically  normal  with  mean  vector  0  and  cer¬ 
tain  p 2  x  p 2  dispersion  matrix  I\  Here  Vec(.)  of  a  p  x  p 
matrix  is  a  p2  x  1  vector  obtained  by  stacking  one  col¬ 
umn  below  another.  Let’s  write 

R  =  R-h(R-R)  =  R+Cjv - =  R+€jvBjv?  (4.5) 

eyv 


here  6jv  =  {loglogN/N}* ,  therefore  for  large  AT,  0  < 
cat  <  1  and  the  elements  of  Bat’s  are  bounded  almost 
surely  because  of  (3.2).  Let  A*  be  any  particular  eigen¬ 
value  of  R  and  Aj  be  the  corresponding  perturbed  eigen¬ 
value  of  R.  Suppose  Z i  is  the  normalized  eigenvector 
of  R  corresponding  to  A*,  then  from  [26],  we  have 

A{  w  A{  eTvZf^BArZ*.  (4*6) 

It  is  important  to  note  that  A*  may  be  repeated  eigen¬ 
value,  then  Zt  is  not  unique.  Take  any  particular  Z*, 
still  (4.6)  is  valid  ([26]).  Since  the  elements  of  Bat  are 
asymptotically  normally  distributed,  therefore  ZfBArZ* 
will  also  be  asymptotically  normally  distributed.  Clearly 
E(\i)  =  A*  for  i  =  1, . . .  ,p  as  E( Bn)  =  0  and 

E(Xi  -  Xi)(Xj  -  Xj)  =  4,E(Z(rB„Zi)(Z?Bf/Zj), 

(4.7) 

where  Z*  and  Z j  are  two  orthonormal  eigenvectors  cor¬ 
responding  to  A  i  and  A  j.  Now 

B(ZfBjVZ*)(ZfBJVZi) 

=  ^-E[Z^(Rn  -  R)ZiZf  (Rn  -  R)Zj] 

eN 

=  -^-(ZfRZi)(ZfRZi).  (4.8) 

Note  that  the  last  equality  of  (4.8)  follows  from  [4]. 
Therefore,  from  (4.8)  it  is  clear  that  A  *  will  be  asymp¬ 
totically  normally  distributed  with  mean  A*  and  vari- 
ance  -ft  fori  =  1  ,  ...,p.  Asymptotically  A*  and  A j 
are  independently  distributed  for  i  ^  j.  Note  that 
for  large  AT,  the  distribution  of  {\(q+i)y . . . ,  A(p)}  is  the 
distribution  of  order  statistics  of  (p  —  q)  random  ran¬ 
dom  sample  from  a  normal  distribution  with  mean  a2 
and  variance  Therefore,  to  compute  the  right  hand 
side  of  (4.4)  we  need  to  know  the  joint  distribution  of 
{A(*+i)  -  A(i+1)}  for  j  =  q . . .  ,p  -  1. 

It  is  well  known  that  the  exact  distribution  of  the 
difference  of  order  statistics  of  normal  distribution  are 
difficult  to  obtain.  Although  some  well  known  approxi¬ 
mations  results  are  available.  We  use  re-sampling  tech¬ 
nique  similarly  as  [13]  to  estimate  (4.4).  The  details 
will  be  presented  in  the  next  section. 


5.  NUMERICAL  EXPERIMENTS: 


In  this  section  we  perform  some  numerical  experi¬ 
ments  to  present  both  the  effectiveness  of  our  method 
and  the  usefulness  of  the  analysis.  All  the  computa¬ 
tions  are  performed  at  the  Pennsylvania  State  Univer¬ 
sity  using  SUN  workstation.  We  use  the  RAN2  uniform 
random  number  generator  of  [18]  and  the  singular  value 
decomposition  by  IMSL  subroutine.  The  programs  are 


written  in  FORTRAN.  It  is  available  on  request  from 
the  author.  We  consider  the  following  model. 

p  =  5,  q  =  2, 0i  =  1.0,  <f>2  =  2.0 

The  covariance  matrix  of  the  real  and  imaginary  part 
of  x(t)  is  a  2  x  2  matrix  as  follows. 

1.25  1.00' 

1.00  1.25 

We  consider  N  =  100.  The  real  and  imaginary  part 
of  x(t)  are  taken  to  be  independent.  For  the  com¬ 
parison  purposes  with  the  other  known  methods,  we 
consider  the  error  random  variables  to  be  normally  dis¬ 
tributed  with  a  —  0.75  (SNR  «  3.01dB),  a  =  1.0  (SNR 
«  .511dB)  and  a  =  1.125  (SNR  »  -.511dB). 

We  use  twelve  different  Cjv,  all  of  them  satisfying 
(2.7),  but  converging  to  zero  at  different  rates.  We  de¬ 
fine  them  as  P(  1), . . . ,  P(12),  they  are  as  follows  P(l)  = 
(ir)1)P(2)  =  (i)-2,P(3)  =  (i)'3, 

m  =  (£V4>  p®  =  m  =  (z^)2. 

P(7)  =  (lopiv)  >  =  (lojiv)  >  =  (iofliv)  > 

P(10)  =  (jvi^Tv)  ’  •f’(H)  =  ( vv/offiv)  > 

P(12)  =  (csfer)- 

Out  of  1000  replications,  the  percentage  of  correct  es¬ 
timates  (PCE),  and  the  percentage  of  wrong  estimates 
(PWE)  are  obtained  for  different  SNR.  We  also  obtain 
the  theoretical  values  of  the  probabilities  as  follows.  We 
draw  a  sample  of  size  (p  —  q)  from  a  Gaussian  random 
variable  with  mean  a1  and  variance  We  order  them 
as  A(9+i)  , . . . ,  A(p)  and  check  whether 

A(*+i)  -  Ay+i)  +  {k  -  j)CN  <  0 

for  j  =  q, . . . ,  k  —  1,  k  +  1, . . .  ,p.  We  repeat  the  pro¬ 
cess  five  thousand  times  and  compute  the  percentage  of 
time  it  is  true  and  that  gives  an  estimate  of  (4.4).  We 
also  calculate  the  percentage  of  under  estimates  from 
the  distributional  properties  of  A(i), . . . ,  \(qy  Adding 
the  probability  of  over  estimate  and  the  probability  of 
under  estimate  we  obtain  the  probability  of  wrong  es¬ 
timate.  Finally  subtracting  the  probability  of  wrong 
estimate  from  one  we  obtain  the  probability  of  correct 
estimate.  The  results  are  reported  in  Table  1.  The 
quantity  within  the  bracket  indicates  the  theoretical  es¬ 
timate  of  the  probability  of  correct  estimate  (PCE)  and 
the  probability  of  wrong  estimate  (PWE). 

At  finite  sample  size  the  performance  of  the  pro¬ 
posed  method  very  much  depends  on  the  penalty  func¬ 
tion  used,  although  all  of  them  give  consistent  estimates 
as  the  sample  size  tends  to  infinity.  From  Table  1  it  is 
clear  that  the  performances  of  all  the  methods  becomes 
worse  at  low  SNR  which  is  not  very  surprising.  It  is 


important  to  observe  that  the  theoretical  probabilities 
match  quite  well  in  almost  all  the  cases  considered  and 
the  estimates  are  better  in  most  of  the  cases  at  high 
SNR. 

Table  1 


a  =  .75 


P(k) 

PCE 

PWE 

P(l) 

P(2) 

P(3) 

P(4) 

P(5) 

P(6) 

P(7) 

P(8) 

P(9) 

P(10) 

P(ll) 

P(12) 

1.00  (1.00) 
0.95  (1.00) 
0.56  (0.74) 
0.12  (0.23) 
0.38  (0.56) 
1.00  (1.00) 
0.99  (1.00) 
0.96  (1.00) 
0.77  (0.89) 
1.00  (1.00) 
0.12  (0.23) 
1.00  (1.00) 

0.00  (0.00) 
0.05  (0.00) 
0.44  (0.26) 
0.88  (0.77) 
0.62  (0.44) 
0.00  (0.00) 
0.01  (0.00) 
0.04  (0.00) 
0.23  (0.11) 
0.00  (0.00) 
0.88  (0.77) 
0.00  (0.00) 

<j  =  1.00 


PCE 

PWE 

P(l) 

P(2) 

P(3) 

P(4) 

P(5) 

P(6) 

P(7) 

P(8) 

P(9) 

P(10) 

P(ll) 

P(12) 

0.92  (0.98) 
0.42  (0.61) 
0.07  (0.21) 
0.00  (0.11) 
0.03  (0.19) 
0.97  (1.00) 
0.81  (0.96) 
0.42  (0.60) 
0.15  (0.29) 
0.80  (0.96) 
0.00  (0.14) 
0.93  (0.99) 

0.08  (0.02) 
0.58  (0.39) 
0.93  (0.79) 
1.00  (0.89) 
0.97  (0.81) 
0.03  (0.00) 
0.19  (0.04) 
0.58  (0.40) 
0.85  (0.71) 
0.20  (0.04) 
1.00  (0.86) 
0.07  (0.01) 

a  =  1.125 


PCE 

PWE 

P(l) 

0.72  (0.88) 

0.28  (0.12) 

P(2) 

0.20  (0.43) 

0.80  (0.57) 

P(3) 

0.01  (0.21) 

0.99  (0.79) 

P(4) 

0.00  (0.19) 

1.00  (0.81) 

P(5) 

0.00  (0.22) 

1.00  (0.78) 

P(6) 

0.86  (0.96) 

0.14  (0.04) 

P(7) 

0.53  (0.71) 

0.47  (0.29) 

P(8) 

0.20  (0.41) 

0.80  (0.59) 

P(9) 

0.04  (0.25) 

0.96  (0.75) 

P(10) 

0.52  (0.70) 

0.48  (0.30) 

P(ll) 

0.00  (0.23) 

1.00  (0.77) 

P(12) 

0.75  (0.91) 

0.25  (0.09) 

6.  HOW  TO  CHOOSE  THE  PENALTY  FUNC¬ 
TION  ? 


Looking  at  the  tables,  it  is  clear  that  the  theoreti¬ 
cal  bounds  are  quite  close  to  the  actual  one.  But  un¬ 
fortunately  without  knowing  the  actual  parameters,  we 
can’t  calculate  the  theoretical  probabilities.  Nobody,  so 
far  did  raise  this  question  that  how  to  estimate  these 
bounds.  We  estimate  these  probabilities  with  the  help 
of  the  given  sample  and  using  re-sampling  technique. 
We  finally  use  them  to  choose  the  proper  penalty  func¬ 
tion,  which  definitely  depends  on  the  model  as  well  as 
the  given  sample.  From  any  particular  realization  of 
the  model,  we  compute  the  matrix  R  (see  (2.2))  and 
obtain  the  p  eigenvalues  and  the  corresponding  eigen¬ 
vectors.  Now  suppose  using  the  penalty  function  P(k ), 
we  estimate  the  order  of  the  model  as  M*.  Assuming 
Mfc  is  the  correct  order  model,  we  compute  the  esti¬ 
mate  of  cr2,  by  averaging  the  last  p  —  M *  eigenvalues, 
say  a2.  We  estimate  (4.4),  the  probability  of  over  es¬ 
timate,  assuming  a2  is  the  true  value  of  cr2,  M*  is  the 
correct  order  model  and  using  the  re-sampling  (simu¬ 
lation)  technique  as  described  in  the  previous  section 
by  generating  B  normal  random  sample  of  size  p  —  M*. 
Similarly  assuming  A(i), . . . ,  X(Mk)  are  the  true  values 
of  A(i),...,A (Mfc),  Mk  is  the  correct  order  model  and 
using  the  asymptotic  distribution  of  A^), . . . ,  \Mk)>  as 
obtained  in  section  4,  we  estimate  the  probability  of 
under  estimate.  Therefore,  adding  the  two  we  obtain 
an  estimate  of  the  probability  of  wrong  detection.  It 
can  be  shown  easily  that  for  large  N,  the  estimate  of 
probability  of  wrong  detection  under  the  assumptions 
of  correct  order  model  will  be  less  than  the  estimate  of 
wrong  detection  under  the  assumption  of  lower/  higher 
order  model,  because  the  former  one  goes  to  zero  as 
N  tends  to  infinity  where  as  the  later  one  goes  to  a 
positive  quantity. 


a 

PCE 

PWE 

0.75 

1.000 

0.000 

1.00 

0.920 

0.080 

1.125 

0.770 

0.230 

It  is  observed  from  Table  2  that  the  proposed  method 
works  quite  well.  As  the  SNR  increases  the  performance 
of  the  proposed  method  improves,  it  verifies  the  consis¬ 
tency  property  of  the  proposed  method.  If  the  SNR 
is  high  («  3dB)  then  the  proposed  method  can  detect 
all  the  times  the  correct  order  model.  As  the  SNR  de¬ 
creases  the  performance  becomes  bad.  Even  then  at 
low  SNR  («  -.5dB)  the  proposed  method  can  detect 
more  than  75  %  of  the  time  the  correct  order  model. 


7.  IF  THE  ERROR  IS  GAUSSIAN: 


So  far  we  did  not  use  any  distributional  assump¬ 
tions  on  the  error  random  variables  of  the  model  ex¬ 
cept  (1.3).  Zhao,  Krishnaiah  and  Bai  [28]  or  Wax  and 
Kailath  [23]  used  AIC,  MDL  or  EDC  type  criteria  if 
the  errors  are  known  to  be  Gaussian.  If  we  know  that 
the  errors  are  Gaussian,  we  should  use  that  information 
and  that  should  yield  better  results.  We  recommend  to 
use  the  modified  EDC  as  follows.  Suppose 


/(&,  Dyv)  =  -logLk  +  Dyv{fc(2p  —  k)  +  1},  (7.1) 


here 


Lk  = 


n?.,+1  v 


(^a^P'*’ 

and  Dn  satisfies  the  following  conditions: 

Dn 


(a)  lim  =  0  (b)  lim 

N-* oo  N  n~¥oq  loglogN 


=  00. 


(7.2) 


We  use  this  idea  and  compute  the  estimate  of  the 
probability  of  wrong  detection  for  all  the  criteria  and 
choose  that  one  which  gives  the  lowest  estimate  of  prob¬ 
ability  of  wrong  detection.  We  use  the  same  model,  and 
the  same  set  of  penalty  functions  and  in  each  trial  we 
choose  that  penalty  function  that  gives  the  lowest  es¬ 
timate  of  probability  of  wrong  detection.  In  each  case 
we  draw  one  hundred  random  sample  ( B  =  100)  to 
compute  the  probability  of  error  and  replicate  it  over 
one  thousand  trials.  The  result  is  reported  in  Table 
2,  which  indicates  the  percentage  of  correct  estimate 
(PCE)  and  the  percentage  of  wrong  estimate  (PWE) 
over  one  thousand  replications. 


Table  2 


Let 


q  =  argminltyiDfsr), 


then  q  is  an  estimate  of  q.  Zhao,  Krishnaiah  and  Bai 
[28]  proved  that  q  is  a  consistent  estimator  of  q.  Note 
that 

P[q^q)  =  l-P[q  =  q], 

where 

P[q  =  q] 

=  P[I(q,Dft)-I(k,Dff)  <  0,  for  k  =  0, . . .  ,q- 1,  q+ 

1]. 

(7.3) 

Since  (7.3)  depends  on  the  distribution  of  Ai,...,Ap, 
using  the  asymptotic  distribution  of  Ai, . . . ,  Ap,  we  can 
estimate  (7.3)  exactly  as  before  by  using  the  re-sampling 
technique.  From  a  class  of  Dyv,  we  can  estimate  (7.3) 
for  each  penalty  function  Dyv  and  choose  that  Dyv  which 


gives  the  smallest  estimated  probability  of  wrong  detec¬ 
tion.  We  use  the  same  model  of  Section  5,  and  use  the 
following  Dat,  defined  as  D(l), . . . ,  D(12)  satisfy  (7.2) 
(except  D(4)  =  1,  which  gives  AIC).  They  are 
D(  1)  =  AT*1,  D( 2)  =  AT*5,  D( 3)  =  A^9, 

D( 4)  =  1,  D( 5)  =  ^op(iV),  D(6)  =  (M^))\ 

^(7)  =  {log(N))’5,  D(8)  =  (/op(AT))*9, 

D(  9)  =  (AT/o^AT))1, 

,0(10)  =  (Nlog(N)y\ 

D(  11)  =  (. Nlog(N ))*9,  -0(12)  =  ^op(iV). 

Out  of  one  thousand  replications,  the  percentage  of 
correct  estimates  and  the  percentage  of  wrong  estimates 
are  reported  in  Table  3. 

Table  3 


<7  =  .75 


Method 

PCE 

PWE 

New 

0.999 

0.001 

AIC 

0.884 

0.116 

BIC 

0.999 

0.001 

EDC 

0.927 

0.073 

a 

=  1.00 

Method 

PCE 

PWE 

New 

0.990 

0.010 

AIC 

0.887 

0.113 

BIC 

0.982 

0.018 

EDC 

0.929 

0.071 

a  : 

=  1.125 

Method 

PCE 

PWE 

New 

0.973 

0.027 

AIC 

0.890 

0.110 

BIC 

0.864 

0.136 

EDC 

0.928 

0.072 

From  the  results  of  Table  3,  it  is  very  clear  that  the 
performances  of  BIC,  EDC  and  modified  EDC  improve 
as  the  SNR  increases.  It  again  verifies  the  consistency 
properties  of  the  three  methods.  On  the  other  hand  the 
inconsistency  of  the  AIC  also  verifies  from  the  results 
of  Table  3.  The  performance  of  BIC  is  generally  better 
than  AIC  or  EDC  if  the  SNR  is  moderate  ( a  =  1.00) 
or  high  (a  =  .75)  but  the  performance  of  BIC  becomes 
quite  bad  compared  to  AIC  or  EDC  at  low  SNR  (a 
=  1.25).  The  same  phenomena  were  observed  in  [11] 
also.  Now  comparing  the  modified  EDC  with  the  rest 
it  is  clear  that  the  modified  EDC  performs  much  better 
than  the  other  known  methods  at  all  SNR.  It  may  not 
be  very  surprising,  because  AIC,  BIC  and  EDC  use 
data  independent  penalty  functions.  On  the  other  hand 
the  modified  EDC  use  data  dependent  penalty  function. 
It  uses  that  penalty  function,  which  is  in  some  sense 


optimal  for  that  given  data  set  within  that  given  class 
of  penalty  functions. 


8.  CONCLUSIONS: 


In  this  paper  we  consider  the  direction  of  arrival 
model  and  propose  a  new  method  to  estimate  the  num¬ 
ber  of  signals.  We  do  not  need  any  distributional  as¬ 
sumptions  on  the  error  random  variables,  except  the 
finiteness  of  the  second  order  moment.  It  is  well  known 
that  if  the  error  are  Gaussian,  then  we  can  use  the 
AIC,  MDL  or  EDC  to  estimate  the  number  of  signals. 
We  propose  a  modified  criteria,  if  it  is  known  that  the 
errors  are  Gaussian.  It  is  observed  that  our  proposed 
method  works  better  than  the  usual  AIC,  MDL  or  EDC 
in  many  situations.  It  may  be  mentioned  that  the  pro¬ 
posed  method  may  be  difficult  for  online  implementa¬ 
tion.  More  work  is  needed  in  that  direction. 
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